我们假设现有的句子级机器翻译(MT)指标在人类参考包含歧义时会效率降低。为了验证这一假设,我们提出了一种非常简单的方法,用于扩展预审计的指标以在文档级别合并上下文。我们将我们的方法应用于三个流行的指标,即Bertscore,Prism和Comet,以及无参考的公制Comet-QE。我们使用提供的MQM注释评估WMT 2021指标共享任务的扩展指标。我们的结果表明,扩展指标的表现在约85%的测试条件下优于其句子级别的级别,而在排除低质量人类参考的结果时。此外,我们表明我们的文档级扩展大大提高了其对话语现象任务的准确性,从而优于专用基线高达6.1%。我们的实验结果支持我们的初始假设,并表明对指标的简单扩展使他们能够利用上下文来解决参考中的歧义。
translated by 谷歌翻译
在这项工作中,我们详细描述了深度学习和计算机视觉如何帮助检测AirTender系统的故障事件,AirTender系统是售后摩托车阻尼系统组件。监测飞行员运行的最有效方法之一是在其表面上寻找油污渍。从实时图像开始,首先在摩托车悬架系统中检测到Airtender,然后二进制分类器确定Airtender是否在溢出油。该检测是在YOLO5架构的帮助下进行的,而分类是在适当设计的卷积神经网络油网40的帮助下进行的。为了更清楚地检测油的泄漏,我们用荧光染料稀释了荧光染料,激发波长峰值约为390 nm。然后用合适的紫外线LED照亮飞行员。整个系统是设计低成本检测设置的尝试。船上设备(例如迷你计算机)被放置在悬架系统附近,并连接到全高清摄像头框架架上。板载设备通过我们的神经网络算法,然后能够将AirTender定位并分类为正常功能(非泄漏图像)或异常(泄漏图像)。
translated by 谷歌翻译
Sockeye 3是神经机器翻译(NMT)的Mockeye工具包的最新版本。现在,基于Pytorch,Sockeye 3提供了更快的模型实现和更高级的功能,并具有进一步的简化代码库。这可以通过更快的迭代,对更强大,更快的模型进行有效的培训以及快速从研究转移到生产的新想法的灵活性,从而实现更广泛的实验。当运行可比较的型号时,Sockeye 3的速度比GPU上的其他Pytorch实现快126%,在CPU上的实现速度高达292%。Sockeye 3是根据Apache 2.0许可发布的开源软件。
translated by 谷歌翻译
自动配音(AD)是翻译应适合给定长度模板的用例之一,以实现源代理和目标语音之间的同步性。对于神经机翻译(MT),在源长度接近源长度(例如,在+ -10%内的字符数内)产生翻译,同时保持质量是一个具有挑战性的任务。控制NMT输出长度为翻译质量的成本,通常用两步的生成N-Best假设的方法来减轻,然后基于长度和质量重新排序它们。这项工作引入了一种自学方法,允许变压器模型直接学习生成与源长度紧密匹配的输出,短等距MT。特别地,我们对等距MT的方法不需要生成多个假设,也不需要任何辅助评分函数。我们向3名语言对(英语 - 法语,意大利语,德语,西班牙语)报告结果,该结果与基于TED谈话数据的公开可用的基准。自动和手动评估都表明,我们的自学习方法与更复杂的等距MT方法进行了执行。
translated by 谷歌翻译
我们介绍了Prosody-Aware Machine翻译的任务,旨在产生适合配音的翻译。配音是口语句要求将内容传输以及源的韵律结构转移到目标语言中以保留时序信息。实际上,这意味着从源暂停到目标并确保目标语音段具有大致相同的源片段的暂停。在这项工作中,我们提出了一种隐含和明确的建模方法,将韵律信息整合到神经机翻译中。英语 - 德语/法语与自动指标的实验表明,最简单的考虑方法最佳。结果是通过人类评估的翻译和配音视频确认。
translated by 谷歌翻译
基于纯粹关注的深度神经网络在几个领域中取得了成功,依赖于设计师的最小建筑前瞻性。在人类行动识别(HAR)中,主要是在标准卷积或复发层的顶部采用注意机制,从而提高了整体泛化能力。在这项工作中,我们介绍了动作变压器(ACT),这是一种简单的完全自我注意的架构,可以始终如一地优于混合卷积,复发和周度的更详细的网络。为了限制计算和能量请求,建立以前的人类行动识别研究,所提出的方法利用小型时间窗口的2D姿势表示,为准确且有效的实时性能提供低延迟解决方案。此外,我们开源MOMES2021是一个新的大规模数据集,作为建立正式培训和评估基准的实时短时哈哈。拟议的方法在MOMY2021上广泛测试,并与几个最先进的架构相比,证明了行为模型的有效性并铺设了未来工作的基础。
translated by 谷歌翻译
It is well known that conservative mechanical systems exhibit local oscillatory behaviours due to their elastic and gravitational potentials, which completely characterise these periodic motions together with the inertial properties of the system. The classification of these periodic behaviours and their geometric characterisation are in an on-going secular debate, which recently led to the so-called eigenmanifold theory. The eigenmanifold characterises nonlinear oscillations as a generalisation of linear eigenspaces. With the motivation of performing periodic tasks efficiently, we use tools coming from this theory to construct an optimization problem aimed at inducing desired closed-loop oscillations through a state feedback law. We solve the constructed optimization problem via gradient-descent methods involving neural networks. Extensive simulations show the validity of the approach.
translated by 谷歌翻译
Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we used an imitation learning-based approach. We formulate our control problem as a search problem over a dataset of experts' demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the expert trajectory as long as the distance between the state representations of the agent and the selected expert trajectory from the dataset do not diverge. Then the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and show human-like behavior of an agent in the Minecraft environment.
translated by 谷歌翻译
Object instance segmentation is a key challenge for indoor robots navigating cluttered environments with many small objects. Limitations in 3D sensing capabilities often make it difficult to detect every possible object. While deep learning approaches may be effective for this problem, manually annotating 3D data for supervised learning is time-consuming. In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We introduce a zero-shot split for Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the ``objectness'' of pixels and generalize to unseen object categories in cluttered indoor environments. Our method, SupeRGB-D, groups pixels into small patches based on geometric cues and learns to merge the patches in a deep agglomerative clustering fashion. SupeRGB-D outperforms existing baselines on unseen objects while achieving similar performance on seen objects. Additionally, it is extremely lightweight (0.4 MB memory requirement) and suitable for mobile and robotic applications. The dataset split and code will be made publicly available upon acceptance.
translated by 谷歌翻译
Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively affirming -- siloed in so called ``echo chambers'' of exclusively like-minded discussants. Yet, to date we lack sufficient means to measure viewpoint diversity in conversations. To this end, in this paper, we operationalize two viewpoint metrics proposed for recommender systems and adapt them to the context of social media conversations. This is the first study to apply these two metrics (Representation and Fragmentation) to real world data and to consider the implications for online conversations specifically. We apply these measures to two topics -- daylight savings time (DST), which serves as a control, and the more politically polarized topic of immigration. We find that the diversity scores for both Fragmentation and Representation are lower for immigration than for DST. Further, we find that while pro-immigrant views receive consistent pushback on the platform, anti-immigrant views largely operate within echo chambers. We observe less severe yet similar patterns for DST. Taken together, Representation and Fragmentation paint a meaningful and important new picture of viewpoint diversity.
translated by 谷歌翻译